A Fully Compressed Pattern Matching Algorithm for Simple Collage Systems
Authors
Abstract
We study the fully compressed pattern matching (FCPM) problem: given 𝒯 and 𝒫, which are compressed descriptions of a text T and a pattern P respectively, find the occurrences of P in T without decompressing 𝒯 or 𝒫. The problem is challenging because the pattern, as well as the text, is given in compressed form. In this paper we present an FCPM algorithm for simple collage systems. Collage systems are a general framework that can represent various kinds of dictionary-based compression, and simple collage systems are a subclass that includes LZW and LZ78 compression. A collage system has the form 〈D,S〉, where D is a dictionary and S is a sequence of variables from D. Our FCPM algorithm runs in O(‖D‖² + mn log |S|) time, where n = |𝒯| = ‖D‖ + |S| and m = |𝒫|. This is faster than the previous best result of O(m²n²) time.
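To make the pair 〈D,S〉 concrete, the following is a minimal sketch of how an LZ78 parse can be written as a dictionary D of assignments plus a sequence S of variables. It is an illustration only, not the FCPM algorithm of the paper. The function names (`lz78_collage`, `expand`, `is_simple`) and the tuple encoding of rules are hypothetical. The check in `is_simple` assumes the usual definition from the collage-system literature: no truncation or repetition, and every concatenation has at least one operand of length 1.

```python
def lz78_collage(text):
    """Build a pair (D, S) from an LZ78 parse of `text`.

    D is a list of rules, each ("char", c) or ("concat", i, j), where i and j
    index earlier rules in D. S is a list of indices into D whose expansions,
    concatenated in order, give back `text`.
    """
    D = []
    char_var = {}   # character -> index of its single-character rule
    trie = {}       # (phrase variable or None, char) -> phrase variable
    S = []
    cur = None      # variable for the phrase matched so far (None = empty)
    for c in text:
        if c not in char_var:
            char_var[c] = len(D)
            D.append(("char", c))
        nxt = trie.get((cur, c))
        if nxt is not None:
            cur = nxt               # keep extending the current match
            continue
        if cur is None:
            new = char_var[c]       # new phrase is the single character
        else:
            new = len(D)            # new phrase = known phrase + one character
            D.append(("concat", cur, char_var[c]))
        trie[(cur, c)] = new
        S.append(new)
        cur = None
    if cur is not None:             # input ended inside a known phrase
        S.append(cur)
    return D, S


def expand(D, v):
    """Decompress variable v (used here only to check the sketch)."""
    rule = D[v]
    if rule[0] == "char":
        return rule[1]
    return expand(D, rule[1]) + expand(D, rule[2])


def is_simple(D):
    """True if every concatenation has an operand of length 1 (assumed definition)."""
    lengths = []
    for rule in D:
        if rule[0] == "char":
            lengths.append(1)
        else:
            left, right = lengths[rule[1]], lengths[rule[2]]
            if left != 1 and right != 1:
                return False
            lengths.append(left + right)
    return True


D, S = lz78_collage("abababab")
assert "".join(expand(D, v) for v in S) == "abababab"
assert is_simple(D)
```

Each LZ78 phrase extends an earlier phrase by one character, so the right operand of every concatenation has length 1. That is why this encoding falls into the simple subclass. An FCPM algorithm would work on D and S directly, whereas `expand` decompresses and is included only to verify the encoding.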
Similar resources
A Boyer-Moore Type Algorithm for Compressed Pattern Matching
We apply the Boyer–Moore technique to compressed pattern matching for a text string described in terms of a collage system, which is a formal framework that captures various dictionary-based compression methods. For a subclass of collage systems that contain no truncation, our new algorithm runs in O(‖D‖ + n · m + m + r) time using O(‖D‖ + m) space, where ‖D‖ is the size of dictionary D, n is the c...
Multiple Pattern Matching Algorithms on Collage System
Compressed pattern matching is one of the most active topics in string matching. The goal is to find all occurrences of a pattern in a compressed text without decompression. Various algorithms have been proposed depending on underlying compression methods in the last decade. Although some algorithms for multipattern searching on compressed text were also presented very recently, all of them are...
Bit-Parallel Approach to Approximate String Matching in Compressed Texts
In this paper, we address the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, which is a formal system proposed by Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word,...
More Speed and More Compression: Accelerating Pattern Matching by Text Compression
This paper addresses the problem of speeding up string matching by text compression, and presents a compressed pattern matching (CPM) algorithm which finds a pattern within a text given as a collage system 〈D,S〉 such that variable sequence S is encoded by byte-oriented Huffman coding. The compression ratio is high compared with existing CPM algorithms addressing the problem, and the search time...
Collage system: a unifying framework for compressed pattern matching
We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framew...
Journal:
- Int. J. Found. Comput. Sci.
Volume 16, Issue
Pages -
Publication date: 2004